
    Accelerating Bayesian hierarchical clustering of time series data with a randomised algorithm

    We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is part of the R package BHC, which can be downloaded from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts that can be used to reproduce the analyses carried out in this paper, at https://sites.google.com/site/randomisedbhc/

    A framework for parameter estimation and model selection from experimental data in systems biology using approximate Bayesian computation.

    As modeling becomes a more widespread practice in the life sciences and biomedical sciences, researchers need reliable tools to calibrate models against ever more complex and detailed data. Here we present an approximate Bayesian computation (ABC) framework and software environment, ABC-SysBio, a Python package that runs on Linux and Mac OS X systems and enables parameter estimation and model selection in the Bayesian formalism by using sequential Monte Carlo (SMC) approaches. We outline the underlying rationale, discuss the computational and practical issues and provide detailed guidance as to how the important tasks of parameter inference and model selection can be performed in practice. Unlike other available packages, ABC-SysBio is particularly well suited to the challenging problem of fitting stochastic models to data. To demonstrate the use of ABC-SysBio, in this protocol we postulate the existence of an imaginary reaction network composed of seven interrelated biological reactions (involving a specific mRNA, the protein it encodes and a post-translationally modified version of the protein), a network that is defined by two files containing 'observed' data that we provide as supplementary information. In the first part of the PROCEDURE, ABC-SysBio is used to infer the parameters of this system, whereas in the second part we use ABC-SysBio's relevant functionality to discriminate between two different reaction network models, one of them being the 'true' one. Although the Bayesian formalism is computationally expensive, the additional insights it provides more than make up for this cost, especially in complex problems.
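The idea behind approximate Bayesian computation can be illustrated with a minimal sketch. The code below is not ABC-SysBio's API and does not use the sequential Monte Carlo scheme the abstract describes; it shows the simpler ABC rejection variant on a toy model. The model (exponential waiting times), the uniform prior, the summary statistic (sample mean), and all function and parameter names are assumptions made for illustration only.

```python
import random
import statistics


def simulate(rate, n_obs, rng):
    """Hypothetical stochastic model: n_obs exponential waiting times."""
    return [rng.expovariate(rate) for _ in range(n_obs)]


def abc_rejection(observed, prior_low, prior_high, epsilon, n_draws, seed=0):
    """Plain ABC rejection: keep prior draws whose simulated summary
    statistic (the sample mean, an assumed choice) lies within epsilon
    of the observed one."""
    rng = random.Random(seed)
    target = statistics.mean(observed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(prior_low, prior_high)  # draw from the prior
        simulated = simulate(rate, len(observed), rng)
        if abs(statistics.mean(simulated) - target) <= epsilon:
            accepted.append(rate)  # approximate posterior sample
    return accepted


# Synthetic "observed" data generated with a known rate of 2.0.
observed = simulate(2.0, 100, random.Random(42))
posterior = abc_rejection(observed, 0.1, 5.0, epsilon=0.02, n_draws=5000)
posterior_mean = statistics.mean(posterior)
```

The accepted draws approximate the posterior over `rate`. SMC-based ABC, as used in the protocol, improves on this by shrinking `epsilon` over a sequence of weighted populations rather than sampling from the prior at every step.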

    Dynamic Facial Landmarking Selection for Emotion Recognition using Gaussian Processes

    Facial features are the basis of the emotion recognition process and are widely used in affective computing systems. The emotional process is produced by dynamic changes in physiological signals and in the visual responses related to facial expressions. An important factor in this process is the shape information of a facial expression, represented as dynamically changing facial landmarks. In this paper we present a framework for dynamic facial landmark selection based on facial expression analysis using Gaussian Processes. We track facial features using Active Appearance Models for facial landmark detection, and then apply Gaussian process ranking over the dynamic emotional sequences to establish which landmarks are most relevant for emotional multivariate time-series recognition. The experimental results show that Gaussian Processes can effectively fit an emotional time series and that ranking by log-likelihood finds the landmarks (mouth and eyebrow regions) that best represent a given facial expression sequence. Finally, we use the best-ranked landmarks in emotion recognition tasks, obtaining accurate performance on acted and spontaneous scenarios in emotional datasets.

    Transcriptional, epigenetic and metabolic signatures in cardiometabolic syndrome defined by extreme phenotypes

    This is the final version. Available on open access from BMC via the DOI in this record. Availability of data and materials: The datasets generated during this study are available at EGA under study ID EGAS00001003780. The code generated during this study and all supplementary tables are available at GitLab https://gitlab.com/dseyres/extremephenotype.
    Background: This work is aimed at improving the understanding of cardiometabolic syndrome pathophysiology and its relationship with thrombosis by generating a multi-omic disease signature. Methods/Results: We combined classic plasma biochemistry and plasma biomarkers with the transcriptional and epigenetic characterisation of cell types involved in thrombosis, obtained from two extreme phenotype groups (morbidly obese and lipodystrophy) and lean individuals, to identify the molecular mechanisms at play, highlighting patterns of abnormal activation in innate immune phagocytic cells. Our analyses showed that extreme phenotype groups could be distinguished from lean individuals, and from each other, across all data layers. The characterisation of the same obese group six months after bariatric surgery revealed the loss of the abnormal activation of innate immune cells previously observed. However, rather than reverting to the gene expression landscape of lean individuals, this occurred via the establishment of novel gene expression landscapes. Netosis and its control mechanisms emerge amongst the pathways that show an improvement after surgical intervention. Conclusions: We showed that the morbidly obese and lipodystrophy groups, despite some differences, shared a common cardiometabolic syndrome signature. We also showed that this could be used to discriminate, amongst the normal population, those individuals with a higher likelihood of presenting with the disease, even when not displaying the classic features.
    Funding: British Heart Foundation; Medical Research Council (MRC); Wellcome Trust; National Institute for Health Research (NIHR); Isaac Newton fellowship; John and Lucille Van Geest Foundation

    Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen

    The effectiveness of most cancer targeted therapies is short-lived. Tumors often develop resistance that might be overcome with drug combinations. However, the number of possible combinations is vast, necessitating data-driven approaches to find optimal patient-specific treatments. Here we report AstraZeneca’s large drug combination dataset, consisting of 11,576 experiments from 910 combinations across 85 molecularly characterized cancer cell lines, and results of a DREAM Challenge to evaluate computational strategies for predicting synergistic drug pairs and biomarkers. 160 teams participated, providing comprehensive methodological development and benchmarking. Winning methods incorporate prior knowledge of drug-target interactions. Synergy is predicted with an accuracy matching biological replicates for >60% of combinations. However, 20% of drug combinations are poorly predicted by all methods. Genomic rationales for synergy predictions are identified, including ADAM17 inhibitor antagonism when combined with PIK3CB/D inhibition, in contrast to synergy when combined with other PI3K-pathway inhibitors in PIK3CA mutant cells. Peer reviewed

    Model selection in systems biology depends on experimental design.

    Experimental design attempts to maximise the information available for modelling tasks. An optimal experiment allows the inferred models or parameters to be chosen with the highest expected degree of confidence. If the true system is faithfully reproduced by one of the models, the merit of this approach is clear: we simply wish to identify it and the true parameters with the most certainty. However, in the more realistic situation where all models are incorrect or incomplete, the interpretation of model selection outcomes and the role of experimental design need to be examined more carefully. Using a novel experimental design and model selection framework for stochastic state-space models, we perform high-throughput in-silico analyses on families of gene regulatory cascade models, to show that the selected model can depend on the experiment performed. We observe that experimental design thus makes confidence a criterion for model choice, but that this does not necessarily correlate with a model's predictive power or correctness. Finally, in the special case of linear ordinary differential equation (ODE) models, we explore how wrong a model has to be before it influences the conclusions of a model selection analysis.

    MEANS: python package for Moment Expansion Approximation, iNference and Simulation.

    MOTIVATION: Many biochemical systems require stochastic descriptions. Unfortunately these can only be solved for the simplest cases and their direct simulation can become prohibitively expensive, precluding thorough analysis. As an alternative, moment closure approximation methods generate equations for the time-evolution of the system's moments and apply a closure ansatz to obtain a closed set of differential equations, which can then serve as the basis for the deterministic analysis of the moments of the outputs of stochastic systems. RESULTS: We present a free, user-friendly tool implementing an efficient moment expansion approximation with parametric closures that integrates well with the IPython interactive environment. Our package enables the analysis of complex stochastic systems without any constraints on the number of species and moments studied or on the type of rate laws in the system. In addition to the approximation method, our package provides numerous tools to help non-expert users in stochastic analysis. AVAILABILITY AND IMPLEMENTATION: https://github.com/theosysbio/means CONTACTS: [email protected] or [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
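To make "equations for the time-evolution of the system's moments" concrete, here is a minimal sketch for the simplest case, a birth-death process with constant production rate `k` and first-order degradation rate `gamma`. It does not use the MEANS package or its API. Because the rate laws are linear, the mean and variance equations close on their own and no closure ansatz is required; nonlinear rate laws are where a closure becomes necessary. The function name, parameter values, and the forward-Euler integrator are assumptions chosen for illustration.

```python
def birth_death_moments(k, gamma, mean0, var0, t_end, dt=1e-3):
    """Integrate the moment equations of a birth-death process with
    forward Euler (an assumed, deliberately simple integrator):

        d(mean)/dt = k - gamma * mean
        d(var)/dt  = k + gamma * mean - 2 * gamma * var
    """
    mean, var = mean0, var0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        d_mean = k - gamma * mean
        d_var = k + gamma * mean - 2.0 * gamma * var
        mean += dt * d_mean
        var += dt * d_var
    return mean, var


# Start from an empty system and integrate long enough to reach steady state.
mean, var = birth_death_moments(k=10.0, gamma=0.5, mean0=0.0, var0=0.0, t_end=40.0)
```

At steady state both moments approach `k / gamma`, which matches the Poisson stationary distribution of this process and gives a quick sanity check on the equations.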
